Using FBRef Data in R

Introduction

In B1700 you started to learn the basics of R, and in the previous practical for B1701 you learned how to load multiple files at once. However, there are occasions when your data is not stored in flat files and you may want to pull data from an online database or website. Doing this without predefined R packages is beyond the aims of this course; however, many R packages are available that can help you pull data from the web. Examples are: worldfootballR, baseballr, hoopR, SwimmeR, StatsBomb etc. All these packages come with instructions on how to use them to pull relevant data from a variety of sources, and it is worth having a look at some of these. For this practical we will use the worldfootballR package. worldfootballR pulls football data from FBRef, Transfermarkt, Understat, and FotMob. We will focus on FBRef data today, but it’s worth exploring the data pulled from the other websites.

Installing and loading packages

To start, install the worldfootballR package via install.packages("worldfootballR").

Next we need to load this package as well as tidyverse.

# Load packages
library(worldfootballR)
library(tidyverse)
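
If you are unsure whether a package is already installed, you can check before installing it again. The sketch below is one possible approach. Note that find_missing_packages() is a hypothetical helper written for this example and is not part of worldfootballR or tidyverse.

```r
# Hypothetical helper: returns the names of the packages that are not installed
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

missing_pkgs <- find_missing_packages(c("worldfootballR", "tidyverse"))
print(missing_pkgs)

# Install only what is missing (uncomment to run)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

An empty result means both packages are available and you can go straight to the library() calls.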

Loading data

Once you have successfully installed and loaded all the necessary packages, you can begin reading your data.

worldfootballR provides several functions to load data into R. You can find a detailed explanation of all functions in the package documentation. Some of the main functions we will use today are:

  1. fb_match_urls() is used to get the match URLs for the correct league and season.

  2. fb_advanced_match_stats() is used to extract player or team stats for each match in the relevant league. You can extract summary, passing, passing type, defensive, possession, miscellaneous and goalkeeping stats.

Loading team data - domestic competitions

First up we want to get an overview of the team performance data. We will focus on season 2022 to 2023 and 1st tier men’s competitions in Spain (La Liga).

We will first get the match URLs for all La Liga matches in 2022/2023. Once we have these, we will use them to load the different statistics available for each team, and finally we will merge the individual statistics tables into one combined table. Note that the code below scrapes all the data off the internet and will take a while; patience is your friend.

# Get match URLs (season_end_year is the year in which the season ends, so 2022/2023 is 2023)
MatchUrls <- fb_match_urls(country = "ESP", gender = "M", season_end_year = 2023, tier = "1st")

# Get different team stats
TeamStats2023SumDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "summary", team_or_player = "team")
TeamStats2023PassDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "passing", team_or_player = "team")
TeamStats2023PassTDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "passing_types", team_or_player = "team")
TeamStats2023DefDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "defense", team_or_player = "team")
TeamStats2023PosDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "possession", team_or_player = "team")
TeamStats2023MiscDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "misc", team_or_player = "team")
TeamStats2023KeeperDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "keeper", team_or_player = "team")

# Combine the team stats data into one DF
CombinedTeamDataDF <- merge(TeamStats2023SumDF, TeamStats2023PassDF, all = TRUE)
CombinedTeamDataDF <- CombinedTeamDataDF %>%
    merge(TeamStats2023PassTDF, all = TRUE) %>%
    merge(TeamStats2023DefDF, all = TRUE) %>%
    merge(TeamStats2023PosDF, all = TRUE) %>%
    merge(TeamStats2023MiscDF, all = TRUE) %>%
    merge(TeamStats2023KeeperDF, all = TRUE)
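
To see what the chain of merge() calls is doing, it can help to try the same idea on a small example. The sketch below uses made-up toy tables (the column names Team, Game, Gls, Cmp and Tkl are assumptions for illustration, not the actual FBRef columns) and uses Reduce() as an alternative way of merging a list of tables one after another. When no `by` argument is given, merge() joins on every column name the two tables have in common.

```r
# Toy data standing in for three scraped stat tables (values are made up)
summary_df <- data.frame(Team = c("Team A", "Team B"), Game = c(1, 1), Gls = c(2, 0))
passing_df <- data.frame(Team = c("Team A", "Team B"), Game = c(1, 1), Cmp = c(410, 380))
defense_df <- data.frame(Team = c("Team A", "Team B"), Game = c(1, 1), Tkl = c(15, 21))

stat_tables <- list(summary_df, passing_df, defense_df)

# Merge the tables pairwise, left to right; shared columns (Team, Game) act as keys
# all = TRUE keeps rows that appear in only one of the tables
combined_df <- Reduce(function(x, y) merge(x, y, all = TRUE), stat_tables)
print(combined_df)
```

The result has one row per team and game, with the statistic columns from all three tables side by side.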

Loading player data - domestic competitions

Next we want to get an overview of the player data. We will again focus on season 2022 to 2023 and 1st tier men’s competitions in Spain.

We will reuse the match URLs we obtained in the previous step to load the different statistics available for each player, and finally we will merge the individual statistics tables into one combined table. Note that the code below scrapes all the data off the internet and will take a while; patience is your friend.

# Individual player stats
PlayerStats2023SumDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "summary", team_or_player = "player")
PlayerStats2023PassDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "passing", team_or_player = "player")
PlayerStats2023PassTypesDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "passing_types", team_or_player = "player")
PlayerStats2023DefDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "defense", team_or_player = "player")
PlayerStats2023PosDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "possession", team_or_player = "player")
PlayerStats2023MiscDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "misc", team_or_player = "player")
PlayerStats2023KeeperDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "keeper", team_or_player = "player")

# Combine individual player stats
CombinedPlayerDataDF <- merge(PlayerStats2023SumDF, PlayerStats2023PassDF, all = TRUE)
CombinedPlayerDataDF <- CombinedPlayerDataDF %>%
    merge(PlayerStats2023PassTypesDF, all = TRUE) %>%
    merge(PlayerStats2023DefDF, all = TRUE) %>%
    merge(PlayerStats2023PosDF, all = TRUE) %>%
    merge(PlayerStats2023MiscDF, all = TRUE) %>%
    merge(PlayerStats2023KeeperDF, all = TRUE)
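
Because the seven calls differ only in their stat_type, you may prefer to loop over the stat types instead of writing each call out. The sketch below shows one way this could look. fetch_all_stats() and fake_fetch() are hypothetical names introduced for this example; fake_fetch() is only a stand-in that returns made-up data so the sketch runs without scraping anything. In the practical you would pass fb_advanced_match_stats as fetch_fn instead.

```r
# Stat types used in this practical
stat_types <- c("summary", "passing", "passing_types", "defense",
                "possession", "misc", "keeper")

# Hypothetical helper: call fetch_fn once per stat type and return a named list
fetch_all_stats <- function(match_urls, team_or_player, fetch_fn) {
  stats <- lapply(stat_types, function(type) {
    fetch_fn(match_url = match_urls, stat_type = type, team_or_player = team_or_player)
  })
  names(stats) <- stat_types
  stats
}

# Stand-in for the scraping function so the sketch runs offline (made-up output)
fake_fetch <- function(match_url, stat_type, team_or_player) {
  data.frame(MatchURL = match_url, StatType = stat_type, Level = team_or_player)
}

player_stats <- fetch_all_stats(c("url-1", "url-2"), "player", fake_fetch)
print(names(player_stats))
```

Each element of the returned list is one statistics table, which you can access by name, for example player_stats$passing.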

Loading team and player data - international competitions

So far we have focussed on national competitions. If we want to extract data from non-domestic competitions (e.g. the Champions League or World Cups), we need to use slightly different code. Instead of using the country, we need to locate the relevant URL for the competition. You can do so by going to https://fbref.com/en/comps/, clicking on the relevant competition and copying the URL. For example, if I was after Champions League data I would copy the following URL: https://fbref.com/en/comps/8/history/Champions-League-Seasons. Now let’s see if we can get the match URLs and the team and player summary statistics for the 2022 men’s World Cup.

MatchUrls <- fb_match_urls(country = "", gender = "M", season_end_year = 2022, non_dom_league_url = "https://fbref.com/en/comps/1/history/World-Cup-Seasons")

WorldCupTeamDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "summary", team_or_player = "team")

WorldCupPlayerDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "summary", team_or_player = "player")

Saving your data

Now that we have created four tables with player and team stats for two different competitions, we can save these as RDS files. Doing this immediately means we do not have to go through the tedious process of scraping all the data off the internet again.

saveRDS(CombinedTeamDataDF, file="C:/Users/wkb14101/OneDrive - University of Strathclyde/MSc SDA/R Projects/B1701/data/Saved Data/CombinedTeamData.rds")
saveRDS(CombinedPlayerDataDF, file="C:/Users/wkb14101/OneDrive - University of Strathclyde/MSc SDA/R Projects/B1701/data/Saved Data/CombinedPlayerData.rds")
saveRDS(WorldCupTeamDF, file="C:/Users/wkb14101/OneDrive - University of Strathclyde/MSc SDA/R Projects/B1701/data/Saved Data/WorldCupTeamData.rds")
saveRDS(WorldCupPlayerDF, file="C:/Users/wkb14101/OneDrive - University of Strathclyde/MSc SDA/R Projects/B1701/data/Saved Data/WorldCupPlayerData.rds")
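
Once a table has been saved, readRDS() loads it back in. One way to avoid scraping twice is to check whether the saved file already exists and only run the slow step if it does not. The sketch below illustrates this idea with a temporary file. read_or_create_rds() and slow_step() are hypothetical names for this example, and the toy data frame stands in for a scraped table.

```r
# Hypothetical helper: read a saved .rds file if it exists, otherwise create and save it
read_or_create_rds <- function(path, create_fn) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- create_fn()
  saveRDS(result, file = path)
  result
}

# Demonstration with a temporary file and a stand-in for the slow scraping step
cache_path <- file.path(tempdir(), "ExampleTeamData.rds")
unlink(cache_path)  # start from a clean state for the demonstration
slow_step <- function() data.frame(Team = c("Team A", "Team B"), Gls = c(2, 0))

first_run <- read_or_create_rds(cache_path, slow_step)   # runs slow_step() and saves the result
second_run <- read_or_create_rds(cache_path, slow_step)  # reads the saved file instead
print(second_run)
```

In the practical, create_fn would be a function that wraps the scraping and merging code, and path would point to your own data folder.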

Creating player averages or team averages

Exercises

Exercise 1: Make sure worldfootballR and tidyverse are installed and loaded.

Show the answer
# Load packages
library(worldfootballR)
library(tidyverse)

Exercise 2: Load all summary and possession team data for the 2018 National Women’s Soccer League (USA).

Show the answer
MatchUrls <- fb_match_urls(country = "USA", gender = "F", season_end_year = 2018, tier="1st")

NWSLTeamSummaryDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "summary", team_or_player = "team")
NWSLTeamPossessionDF <- fb_advanced_match_stats(match_url = MatchUrls, stat_type = "possession", team_or_player = "team")

Exercise 3: Merge your two dataframes together and call it NWSLTeamDF.

Show the answer
NWSLTeamDF <- merge(NWSLTeamSummaryDF,NWSLTeamPossessionDF)

Saving your data

Exercise 4: Save your data file using saveRDS().

Show the answer
saveRDS(NWSLTeamDF, "C:/Users/wkb14101/OneDrive - University of Strathclyde/MSc SDA/R Projects/B1701/data/Saved Data/NWSLTeamData.rds")